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1.  Introduction 


This  document  describes  a  program  to  compute  the  exponential  of  a  given  n  by  n  matrix  B 
multiplied  by  a  scalar  f  that  is  to  be  thought  of  as  representing  time.  Om  primary  goal  has  been 
to  achieve  as  much  accuracy  as  working  precision  permits  without  resorting  to  simulated  higher 

/ACy 

precision.  The  final  product  is  more  complicated  than  we  anticipated  at  the  outset.  How  these 
complications  came  to  be  accepted  is  the  theme  of  this  story.  The  cases  we  considet^nay  be  of 
interest  to  those  who  wish  to  use  the  matrix  exponential  in  their  work.  We  hasten  to  add  that 


the  code  acts  simply  on  simple  cases. 

Th.'s 

•Our  code  is  unlikely  to  be  the  last  word  on  the  computation  of  exp(Zft)  since  speed  and  sim- 

fi\‘ i  JUkCu***  t 

plicity  deserve  some  recognition.  For  simplicity  and  generality  we- considerably  complex  matrices 
and  only  those  whose  order  n  is  small  enough  that  four  n  by  n  arrays  fit  comfortably  in  the  fast 
store.  In  practice  we  expect  to  have  n<100.  Recall  that  unless  B  has  special  structure,  such  as 

being  block  diagonal,  its  exponential  will  have  no  zero  elements,  .' 

.  £TSffidc  A'1 — —  v — - 

The  rest  of  the  paper  is  organized  in  the  way  indicated  inHhe(jable  of  contents.  We  use 

* 

standard  conventions:  capital  letters  for  matrices,  lower  case  for  column  vectors,  and  lower  case 
Greek  for  scalars.  In  particular  we  switch  to  r  for  time  instead  of  t. 


The  designer  of  a  program  to  invert  a  matrix  has  to  decide  what  action  should  be  taken 
when  a  given  matrix  is  close  to  singular.  Should  an  indication  of  sensitivity  be  given  along  with 
some  output?  Although  the  exponential  of  a  matrix  is  always  defined  it  may  be  exceedingly  sensi¬ 
tive  to  tiny  perturbations  of  the  original  matrix.  Ideally  a  program  such  as  ours  should  compute 
some  measure  of  the  sensitivity  of  the  data.  We  have  such  a  facility  but,  for  the  sake  of  brevity, 
we  shall  present  it  in  another  communication. 
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2.  Background 


Prior  to  the  1960’s  the  task  of  computing  the  eigenvalues  of  a  general  matrix  was  considered 
difficult.  Consequently  a  number  of  methods  that  avoid  similarity  transformations  were  tried. 
Examples  are 


(a) 


y  (flr)* 
&  '*! 


for  some  suitable  value  of  m, 


(b)  r (Br)  ,  where  r  is  a  rational  approximation  to  exp.  The  near-diagonal  Pade  approximations 
are  favorites.  See  [Wragg  and  Davies]  and  the  reference  list  of  [Moler  and  van  Loan]. 

(c)  Ev , where  E  is  the  result  of  (a)  applied  to  (Bt)/ 21 . 


A  potential  limitation  of  these  methods  is  that  they  do  not  give  the  exact  solution  in  exact 
arithmetic  augmented  with  an  exact  exponential  function.  Consequently  approximation  error 
must  be  balanced  against  roundoff  error.  For  example,  straight  forward  use  of  the  exponential 
series  can  yield  some  nasty  surprises.  See  (Moler  and  van  Loan]. 

The  attraction  of  similarity  transformations  is  that 

/  ( GBG~l)=  Gf(B)  G~l 

for  any  analytic  function  /.  In  particular,  if  B  can  be  diagonalized  in  a  stable  way,  say 
GBG~l  =  D  =  diagonal,  then 

exp(Bt)  »  G~xexp(Dt)G. 

If  cond(G)  (=||  G||  ||  C/-1||  )  <  100  this  is  an  attractive  method.  However  not  all  matrices  can 
be  diagonalized.  Since  the  effect  of  round  off  is  proportional  to  cond(C)  it  is  important  to  use 
only  well  conditioned  similarities.  That  is  one  of  the  reasons  why  the  Jordan  canonical  form  is 
much  less  useful  in  matrix  computations  than  in  matrix  theory.  The  similarities  that  put  B  in 
Jordan  form  may  be  arbitrarily  ill-conditioned.  The  other  count  against  this  form  when  comput¬ 
ing  exponentials  is  worth  mentioning.  Sometimes  it  is  overkill.  Imagine  a  complicated  Jordan 
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structure  associated  with  an  eigenvalue  -100  when  there  are  other  eigenvalues  of  magnitude  -1. 
After  exponentiation  the  contribution  of  the  block  associated  with  -100  to  the  whole  will  be  negli¬ 
gible  and  there  is  no  need  to  expend  considerable  effort  to  get  that  Jordan  structure  correct.  To 
put  it  briefly,  it  is  in  general  more  difficult  to  compute  the  Jordan  form  than  to  compute  the 
exponential. 

Fortunately  there  is  a  canonical  form  that  can  always  be  obtained  in  a  stable  manner.  Any 
square  matrix  may  be  put  into  upper  triangular  form  by  a  sequence  of  unitary  similarity  transfor¬ 
mations.  Unitary  matrices  have  the  smallest  possible  condition  number,  namely  1,  and  are  the 
analogues  for  complex  matrices  of  the  familiar  orthogonal  matrices  such  as  plane  rotations  and 
Householder  reflections.  The  reason  that  the  Schur  form  5  plays  little  role  in  matrix  theory  is 
that  it  is  not  unique.  However  once  the  order  of  the  eigenvalues  is  fixed  on  the  diagonal  then  the 
absolute  value  of  each  off  diagonal  element  is  fixed.  That  is  good  enough  for  practical  work. 

There  are  many  other  ways  to  approximate  the  exponential  of  a  matrix.  An  excellent  and 
informative  description  is  given  in  [Moler  and  Van  Loan].  To  our  knowledge  none  of  them  is  suit¬ 
able  for  the  general  case.  We  discuss  an  algorithm  suggested  by  G.W.Stewart  in  a  later  section 
but  it  is,  like  ours,  based  on  the  Schur  form. 


3.  An  Ideal  Algorithm  and  its  Flaw 


Nowadays,  thanks  to  the  QR  algorithm  (cf  EISPACK  routines),  the  reduction  of  a  matrix  to 
Schur  (upper  triangular)  form  is  safe  and  easy.  Only  unitary  similarity  transformations  need  be 
used  and  this  confers  stability  on  the  process.  Moreover  special  2  by  2  unitaries  can  be  employed 
to  swap  adjacent  diagonal  elements  of  the  Schur  form  5  (these  are  necessarily  eigenvalues  of  B ) 
without  destroying  triangular  form.  Thus  the  eigenvalues  may  be  put  in  any  convenient  order 
down  the  diagonal  5.  The  attraction  of  this  reduction  is  that  once  the  diagonal  elements  of  5 
have  been  exponentiated  the  upper  triangular  elements  of  ezp(Sr)  are  rational  functions  of  them 
and  would  be  calculated  exactly  in  exact  arithmetic. 

There  are  several  ways  to  exponentiate  a  triangular  matrix.  Here  is  one.  Impose  a  parti* 
tioning  on  5  into  block  form  that  is  as  fine  as  possible  subject  to  the  constraint  that  no  two  diago¬ 
nal  blocks  have  spectra  that  are  close.  Thus  all  eigenvalues  in  a  tight  cluster  must  belong  to  the 
same  block  but  a  block  may  contain  eigenvalues  that  are  quite  far  apart.  A  picture  is  given 
below. 

XXX 

XX . 

X . 

XX  - 
5  ”  X  ■  ■  • 
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X  X 
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It  is  easy  to  verify  that  exp  (St)  has  the  same  block  structure  as  5.  Moreover  there  is  a  sim¬ 
ple  recurrence  that  permits  the  computation  of  the  off  diagonal  blocks  once  the  diagonal  blocks  of 
E  as  exp(Sr)  are  known.  This  result  was  presented  in  [Parlett,  1973) .  For  j <k, 

k- 1 

~  Ej.kSkk  —  E}jSj,k  -  S;,kEu  +  J]  (  SitkEUi  -  Et,kSul  ). 

<— j+i 


When  k»j  +  1  then  the  recurrence  yields  EJJ+l  as  a  function  of  the  neighboring  diagonal  blocks 
of  E.  Similarly  for  each  k>j+  1  the  blocks  on  the  (k-j) th  diagonal  are  determined  by  the 


blocks  that  are  closer  to  the  maim  diagonal  and  lie  in  the  corresponding  row  and  column.  Thus 
E  can  be  built  up  from  the  diagonal  outwards.  Since  the  are  triangular  the  elements  of  £;>t 
may  be  found  sequentially  from  the  bottom  left  element  upwards.  The  partitioning  ensures  that 
these  systems  of  equations  are  well  conditioned.  The  procedure  based  on  this  recurrence  is  simple, 
stable,  and  fast.  So  the  problem  has  been  reduced  to  the  computation  of  exp(rS;;),  for  each  j, 
and  deciding  on  the  block  structure. 

Now  the  exponential  series  is  an  attractive  method  precisely  when  all  the  eigenvalues  are 
clustered.  For  example,  if  all  eigenvalues  are  equal  then  the  series  (expanded  around  the  eigen¬ 
value)  must  terminate  at  the  nth  term  at  the  latest.  In  general  we  would  expand  about  the  mean 
H  of  the  eigenvalues  in  the  block.  Convergence  is  easy  to  monitor.  One  updates  (tSu  - 
from  (rS;;  -  /*/)*  and  computes  its  norm  in  the  process.  If  the  norm  is  small  enough  one  stops, 
otherwise  one  adds  the  next  term. 

This  ideal  algorithm  will  be  accurate  and  efficient  in  many  cases  but  it  is  not  adequate  for 
the  general  case.  The  flaw  appears  in  the  use  of  the  exponential  series  for  the  diagonal  blocks  of 
St.  There  is  no  guarantee  that  the  eigenvalues  of  a  block  are  close  to  their  mean.  In  other  words 
the  conditions  when  the  recurrence  is  accurate  and  the  conditions  when  the  exponential  series 
converges  satisfactorily  do  not  cover  all  possibilities.  Moreover  the  cases  for  which  neither  tech¬ 
nique  is  satisfactory  are  quite  numerous  and  not  at  all  pathological.  Our  gradual  realization  of 
this  unpleasant  fact  lead  to  a  ten  year  retardation  in  our  plans  for  producing  a  program  for  com¬ 
puting  the  matrix  exponential.  This  also  follows  from  the  fact  that  point  sets  in  the  complex 
plane  cannot  always  be  nested.  Moreover  the  larger  the  number  of  terms  used  in  the  series  the 
greater  the  effect  of  roundoff  and  the  greater  the  danger  of  having  a  hump.  A  hump  occurs  when 
one  of  the  partial  sums  of  the  series  is  much  larger  than  the  final  sum.  The  round  off  error  is  pro¬ 
portional  to  the  norm  of  the  greatest  partial  sum. 

An  ordered  set  of  complex  values  zvzv,zit..  is  nested  if,  for  any  indices  satisfying 
i<j<k<l,  |  tj  -  |  <  |  z,  -  zi  | .  An  example  of  a  set  that  cannot  be  nested  is  a  set  of 


uniformly  spaced  points  on  a  circle  in  the  complex  plane.  In  the  algorithm  given  above  if  there 
are  enough  eigenvalues  on  the  circle  then  they  must  all  go  in  the  same  block  but  the  radius  of  the 
circle  may  be  large. 

If  the  eigenvalues  can  be  nested  then  such  an  ordering  should  be  used  for  arranging  them 
along  the  diagonal  of  S.  However  even  if  the  eigenvalues  are  linearly  ordered  the  spacing  between 
them  may  dictate  that  they  all  should  go  int<  the  same  block  despite  a  large  gap  between  the  first 
and  last  of  them.  Example  II  in  the  next  section  gives  an  instance  in  which  the  recurrence  (using 
lxl  blocks)  is  too  inaccurate  and  the  series  is  too  slow.  Note  that  for  that  example  there  is  no 
better  expansion  point  for  this  series  than  0. 


4.  Illustrative  Examples 


The  four  examples  in  this  section  reveal  different  aspects  of  the  task  of  computing  the 
matrix  exponential.  These  computations  were  done  on  a  VAX/750  using  single  precision  (com¬ 
plex)  arithmetic.  In  the  followings  tables,  cexphy*  denotes  the  method  we  use  in  our  matrix 
exponential  program  (see  section  6  and  7). 


Example  I 

0  30  1  1  1  1 

-100  01111 
0  0  0  -6  1  1 

Z  ~  r  0  0  500  0  1  1 

0  0  0  0  0  200 

,0  0  0  0  -15  0 


n 

Method 

Norm 
rel.  error 

Max.  (Element-wise) 
rel.  error 

Timing 

Remark 

0.01 

2.11E-8 

1.92E-7 

.400  sec 

11  terms 

cexphy 

7.32E-7 

7.86E-7 

.250  sec 

0.1 

KWnWErnnn 

6.63E-7 

1.22E-5 

.867  sec 

29  terms 

cexphy 

8.77E-7 

3.27E-6 

.217  sec 

1 

2.72E15 

2.51E16 

4.08  sec 

130  terms 

cexphy 

5.60E-6 

4.20E-5 

.250  sec 

10 

- 

- 

cexphy 

3.15E-5 

3.49E-4 

.300  sec 

100 

- 

- 

- 

cexphy 

3.35E-4 

6.96E-3 

.383  sec 

Table  4.1 


The  small  integer  matrix  Z  looks  innocent  but  its  spectrum  consists  of  two  triple  eigenvalues 
near  +  54. 8i  and  -54. 8i.  Is  this  case  difficult? 

The  answer  is  yes  in  real  arithmetic  but  no  in  complex  arithmetic!  In  the  real  case  no  block 
diagonalization  is  possible  since  the  blocks  have  identical  spectra.  That  leaves  only  the  series  on 

*  cexphy  is  the  n»me  of  i  subroutine  in  the  Appendix. 
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the  whole  of  Z  for  the  "ideal”  algorithm  of  Section  3.  The  diameter  of  Z’s  spectrum  is  109.6 
which  suggests  that  many  terms  will  be  required  in  the  series. 

Now  consider  rZ.  When  r  is  small  (0.01)  then  the  diameter  is  quite  small  too  and  only  11 
terms  are  needed  for  the  series  to  stagnate.  As  r  increases  to  10  and  beyond  the  computation  is 
aborted  by  exponent  overflow.  A  partial  sum  exceeds  the  narrow  range  of  the  VAX/780  (-38  to 
+  38  in  decimal  base)  but  even  on  other  machines  the  computed  sum,  though  very  expensive,  is 
worthless:  it  contains  no  correct  digits.  So  overflow  was  merciful  in  this  instance. 

In  complex  arithmetic  the  two  eigenvalues  can  be  put  into  different  blocks.  These  3  by  3 
blocks  are  easily  exponentiated  (Taylor  series  is  almost  the  same  as  the  Newton  polynomial  in 
this  case)  and  the  rest  is  filled  in  by  the  recurrence.  Element  relative  accuracy  is  excellent  until 
r=l  but  even  for  r=100  at  least  half  the  digits  in  each  element,  even  the  smallest,  are  correct. 
The  execution  time  grows  only  logarithmically  with  the  norm  of  the  matrix. 

The  reason  that  execution  is  quicker  with  r=0.1  than  with  r=0.01  is  that  in  the  latter  case 
the  eigenvalues  are  judged  too  close  to  separate  and  one  6  by  6  block  is  slower  than  two  3  by  3 
blocks. 
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Example  Q 


-15»  -58  1  1 

—14*  -54  1 
—13*  -50 


1  1 
1  1 
1  1 
1  1 
1  1  ’ 
13*  54  1 

14*  58 
15* 


f— n/=T. 


Method 

Norm 
rel.  error 

Max.  (Element-wise) 
rel.  error 

Timing 

Remark 

Taylor  Series 

3.29E-3 

7.42E-1 

126.  sec 

58  terms 

Recurrence 

4.76E-2 

2.68 

1.15  sec 

cexphy 

8.23E-6 

5.85E-4 

23.1  sec 

Table  4.2 


This  is  a  clear  example  in  which  neither  the  series  nor  the  recurrence  is  adequate.  Diagonal- 
ization  also  fails  in  the  same  way  as  the  recurrence.  The  series  is  too  slow  (because  the  block’s 
spectrum  has  diameter  30)  and  the  recurrence  is  too  inaccurate  despite  a  separation  of  1  between 
the  eigenvalues.  Our  algorithm  gets  the  smallest  element  of  the  exponential  to  4  digits  accuracy. 
It  is  5  times  quicker  than  the  series  but  20  times  slower  than  the  recurrence. 


Method 

Norm 
rel.  error 

Max.  (Element-wise) 
rel.  error 

Timing 

Recurrence 

5.48E-8 

1.20E-3 

0.63  sec 

Taylor  Series  without  shift 

1.48E-7 

1.48E+2 

23.7  sec 

Taylor  Series  with  shift»-12 

1.42E-7 

7.38E-7 

32.3  sec 

cexphy 

5.73E-8 

2.34E-6 

14.3  sec 

Table  4.3 

This  example  illustrates  two  things.  The  first  is  the  fairly  well  known  rule  that  it  is  not 
always  safe  to  expand  the  Taylor  series  around  the  mean  of  the  eigenvalues  of  the  matrix.  How¬ 
ever  even  with  this  shift  Taylor  series  and  all  the  other  methods  give  perfect  accuracy  provided 
that  you  judge  error  relative  to  the  norm  of  the  exponential  matrix.  This  permits  small  elements 
to  be  quite  wrong.  This  may  not  matter  in  some  applications  where  the  exponential  of  the  Schur 
form  must  be  multiplied  on  each  side  by  a  fairly  dense  unitary  matrix  that  will  smear  out  the 
errors  in  the  elements  of  exp(rS).  On  the  other  band  our  algorithm  and  the  slow  form  of  the 
series  get  all  the  elements  correct.  (In  this  example,  the  Taylor  series  run  faster  then  expected 
because  it  took  advantage  on  the  bi-diagonal  structure  of  Z.) 
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Example  IV 


Let  (~-)r  denote  the  quotient  rounded  to  single  precision  floating  point  format. 
0  b 

0  2  -2  (8/3)f  •  •  •  (2®/9)r  1 

02  -2  •  •  • 

0  2  •  •  • 


Method 


cexph 


Norm 
rel.  error 

Max.  (Element-wise) 
rel.  error 

Timing 

1.01E-5 

5.28E+  1 

1.05  sec 

1.01E-5 

5.28E+  1 

.553  sec 

11  terms 


Table  4.4 


This  example  is  a  contrived  one  designed  to  defeat  our  program.  The  exponential  of  this 
matrix  is  very  close  to  the  Jordan  block  which  has  eigenvalue  1  on  the  diagonal  and  2  on  the 
superdiagonal.  Both  Taylor  series  and  cexphy  yield  almost  identical  results  but  cexphy  is  faster. 
Although  the  error  is  fairly  small  relative  to  the  norm  of  the  solution  our  algorithm  did  not  get 
the  small  elements  with  any  accuracy.  However  our  estimator  of  the  sensitivity  of  this  matrix  to 
exponentiation,  and  indeed  any  reasonable  estimator,  declares  that  the  small  elements  are  exceed¬ 
ingly  sensitive  to  any  perturbations. 
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5.  Relation  to  Block  Diagonalization 


In  [7|  Stewart  proposed  a  way  to  compute  the  matrix  exponential  that  is  also  based  on  the 
Schur  form.  He  uses  extra  similarity  transforms  to  reduce  S  to  block  diagonal  form  with  as  many 
blocks  as  is  consistent  with  keeping  the  condition  number  of  the  transforms  bounded.  An  imple¬ 
mentation  of  Stewart’s  approach  has  been  made  by  [8|.  From  the  point  of  view  of  backward  sta¬ 
bility  analysis  there  is  little  harm  in  allowing  a  condition  number  of  10  or  even  100.  He  proposed 
to  exponentiate  the  diagonal  blocks  that  remain  by  using  the  series  expansion  around  the  mean  of 
each  block’s  eigenvalues.  His  algorithm  differs  from  the  ideal  one  described  in  the  previous  section 
only  in  the  replacement  of  the  recurrence  formula  by  explicit  reduction  to  block  diagonal  form. 


We  shall  show  below  the  surprising  result  that  there  are  fundamental  connections  between 
the  two  approaches.  Indeed,  in  one  sense,  they  are  equivalent.  The  attractive  aspect  of  Stewart’s 
approach  is  that  the  reduction  is  independent  of  r.  Consequently  as  r  changes  it  is  only  necessary 
to  recompute  the  diagonal  blocks.  However  the  hidden  difficulty  is  to  decide  on  how  large  a  condi¬ 
tion  number  to  allow  for  the  additional  similarities.  Stewart  wanted  to  keep  the  cutoff  indepen¬ 
dent  of  the  matrix  or  r.  The  ideal  algorithm  uses  dynamic  grouping  to  determine  the  block  struc¬ 
ture.  The  criterion  depends  on  the  value  of  r.  The  following  example  shows  that  it  may  not  be 
wise  to  ignore  r. 

Example. 


Thus 


Diagonalization  forces  the  (1,2)  element  to  be  computed  by  an  explicit  subtraction  that  will 


3 


sacrifice  information  when  r  is  small.  The  recurrence  will  be  just  as  bad.  However  by  keeping  the 
matrix  as  a  2  by  2  block  one  can  use  formulae  or  other  techniques  that  yield  the  (1,2)  element 
accurately  for  all  r.  However  it  must  be  said  that  the  error  made  using  diagonalization  or  the 
recurrence,  though  avoidable  and  relatively  large  when  r  is  tiny,  is  nevertheless  at  the  roundoff 
level  compared  with  |j  exp(TB)\  \  .  So,  from  the  point  of  view  of  backward  error  analysis  it  is 
acceptable.  ' 

The  next  example  shows  that  the  requirement  of  keeping  the  condition  number  small  can  be 
too  cautious.  Both  diagonalization  and  the  recurrence  are  very  accurate  in  exponentiating 

r[l0r*  HT8)  ■  where  r=  104  • 

There  is  more  to  say  on  this  topic.  Block  diagonalization  is  regarded  as  a  simplification  of 
the  task  that  can  be  effected  independent  of  r  However  when  r  is  tiny  then  it  is  preferable  to  to 
treat  the  matrix  S  as  one  block  rather  than  to  implement  extra  similarity  transformations.  In 
these  cases  the  Newton  polynomial  subroutine  (see  next  section)  will  return 

/+  rS  or  /+  rS+  (1/2)t2S2 

after  computing  the  coefficients  and  checking  that  the  remainder  of  them  are  negligible. 

At  the  other  extreme,  when  r  is  large,  then  it  is  safe  to  use  diagonalization  even  if  the  asso¬ 
ciated  similarity  transformations  are  ill  conditioned.  This  is  because  the  relative  accuracy  of  the 
off  diagonal  terms  of  the  exponential  depends  only  on  the  eigenvalue  separation,  not  on  the  other 
elements  of  S.  See  [3]  for  the  analysis. 

To  see  the  relation  between  this  reduction  and  the  Recurrence  it  suffices  to  consider  5  parti¬ 
tioned  into  two  blocks  as  shown 


5  =  I  0  S2 

where  5 j  and  S2  are  square.  Provided  that  their  spectra  are  disjoint  a  reduction  can  be  effected 
by  a  simple  similarity 


14 


Sx  0 

0  s2 


where  R  satisfies 


SiR  —  RS2  — 


The  spectral  norm  of  the  reducing  matrix  is  \/l  +  ||  R  ||  2  and  the  same  holds  for  its 
inverse.  Consequently  the  condition  number  of  the  similarity  is  1  +  1 1  R  |  |  2.  It  would  certainly 
be  reasonable  to  use  such  transformations  with  R  satisfying  j  j  R  ||  <3  but  ||  R  ||  =  100  would 
invite  unnecessary  loss  of  accuracy.  Please  note  that  our  expressions  are  not  mere  bounds,  they 
are  exact.  It  should  be  pointed  out  at  this  point  that  there  are  better  conditioned  transformations 
that  block  diagonalize  5  but,  unfortunately,  they  do  not  preserve  the  triangular  form  of  Si  and 
S2.  See  [Demmel] 


In  practice  the  exponential  E=exp(Sr )  will  be  kept  in  factored  form.  So  the  (1,2)  block  of 
E  will  only  be  computed  if  E  itself  is  wanted  and  in  that  case 

Ei 2  *  EiR  —  RE2. 

At  this  point  it  is  convenient  to  introduce  a  little  new  notation.  The  right  hand  side  of  the 
above  equation  is  a  linear  expression  in  R  and  this  expression  is  sometimes  called  the  Sylvester 
operator  associated  with  Ex  and  E2.  so  we  introduce 


Definition. 


Consequently 


SylBM  =  EXM  -  ME2. 


Ex2  =  SylER  and  SylsR  =  S12. 

On  the  other  hand  the  recurrence  (see  Section  3)  yields 

SylsEx2  **  SylESX2. 

Now  observe  that  E,  is  a  function  of  5,,  so  E,  commutes  with  5,  for  i = 1,2.  Consequently  SylE 
commutes  with  Syl$.  The  verification  is  left  as  an  exercise  for  the  reader.  Moreover,  since  the 
spectra  of  Sx  and  S2  are  disjoint  Syls  is  invertible  and  its  inverse  must  also  commute  with  SylE. 


In  other  words 


Via  reduction:  E SylE Syl$l S l2, 

Via  recurrence:  E  12=  Sylg1  SylE  S 12. 

The  commutativity  of  the  two  Syl  operators  guarantees  the  same  answer.  In  our  case  both  Syl 
operators  are  triangular  in  nature  and  so  El2  is  obtained  from  Si2  by  two  backsoives! 

Of  course  the  preceding  analysis  was  in  the  context  of  exact  arithmetic  but  it  does  give  con¬ 
siderable  insight  into  the  properties  of  the  actual  algorithms  executed  in  noisy  arithmetic.  For 
example,  any  error  in  E,  will  get  propagated  to  Ex2  through  the  operator  Syis1.  Hence  the  need 
for  well  separated  spectra  in  Sj  and  S2. 

Let  us  contrast  block  diagonalization  with  the  recurrence.  An  advantage  of  reduction  is 
that  R  is  computed  explicitly  and  so  its  norm  will  be  known  and  can  be  judged  during  the  com¬ 
putation.  An  advantage  of  the  recurrence  is  that  El2  is  computed  explicitly  and  uses  inner  pro¬ 
ducts.  So  there  is  the  possibility  of  using  a  function  that  employs  accumulation  of  inner  products 
with  extra  precision.  However,  given  the  partition,  the  two  methods  are  essentially  the  same.  In 
particular  they  both  face  the  difficult  task,  to  which  the  rest  of  this  communication  is  addressed, 
of  computing  reliably  E,,  i— 1,2. 


1ft 


6.  Outline  of  actual  program 

Justification,  comments,  and  implementation  tricks  will  be  given  in  the  sections  that  follow. 
The  algorithm  uses  explicit  similarities  on  a  given  complex  matrix  B.  The  goal  is  to  form 
ezp(Br)v  for  one  or  more  vectors  v  and  one  or  more  values  of  the  scalar  parameter  r  (time).  It  is 
convenient  to  keep  ezp(Br)  in  factored  form. 

(1]  Compute  the  Schur  form  S  of  B:  B=PSP‘.  P'=P~l,  i.e.  P  is  unitary.  5  is  upper  tri¬ 
angular.  Eigenvalues  are  arranged  down  the  diagonal  in  increasing  order  by  real  part. 


[2]  Impose  a  block  structure  on  5  such  that  the  recurrence,  given  in  Section  3,  will  determine 
the  off  diagonal  blocks  with  high  accuracy.  Recall  that  the  eigenvalues  in  the  diagonal 
blocks  may  or  may  not  be  tightly  clustered. 


[3]  Apply  the  following  sequence  of  steps  to  each  diagonal  submatrix  S}}  to  form 
£'};=exp(r5JJ).  These  blocks  may  be  processed  in  turn  or  in  parallel.  In  many  instances 
only  a  subset  of  the  following  items  will  be  used. 

(a)  Subdivide  the  eigenvalues  of  S;;  into  tight  clusters  if  possible.  Here  is  an  example. 
Let  i2  -1. 


'n 


-0.1 


40. 2» 


40.4i 


-30.0» 


-30.3» 


40 


0» 


Cluster  1:  40.4i,  40. 2i,  40.0i 
Cluster  2:  -0.1,  0.1 
Cluster  3:  -30.0i,  -30.3i 

(b)  Apply  unitary  similarities  to  S„  to  reorder  the  eigenvalues  into  the  clusters  found  in 
(a).  Say  S„ where  Q'-S1. 


R 


a 


40. 2 »  *  *  *  *  * 

40.0  *  *  *  * 

-0.1  *  *  * 


0.1  *  * 

-30. Of  * 

30. 3i 


/?//>  *  * 
R«>  • 

Rjf> 


(c)  Each  Rjfr  is  exponentiated  using  the  Newton  form  of  the  interpolating  polynomial 
on  RL^t’s  eigenvalues.  If  there  are  fewer  than  3  eigenvalues  then  explicit  formulae  are 
available  (see  the  remark  below). 


exp(R^T)  =  eX‘f  I  +  £  7m r""1  fi -  V)- 

No  extra  storage  is  needed  for  the  partial  products.  See  later  notes.  The  divided  differences 
7m  are  computed  by  a  mixture  of  a  (scaling  +  squaring)  technique,  described  as  method  (c) 
of  Section  2,  and  the  standard  recurrence.  Note  that  there  may  be  no  tight  clusters  and  in 
such  cases  the  polynomial  is  evaluated  at  rSj; . 

(d)  Use  the  recurrence  for  the  off  diagonal  blocks  of  R,  if  any. 

(e)  Reverse  the  similarities  made  in  (b). 

Remark.  Formula  for  two  by  two  triangular  matrix.  Let 


Then 


,  *i+  h , 


by-b  2 


exp(rB)  =* 


er>1  r6seu'-sin(i»/(j» 


[4]  Apply  the  recurrence  to  compute  the  off  diagonal  blocks  of  E. 


[5]  For  each  given  v  compute  P(E(P‘v)). 


7.  Discussion  of  Program 


[a]  Balancing. 

For  our  purposes  there  is  a  serious  flaw  in  the  EISPACK  subroutine  BALANC.  It  ignores 
the  diagonal  element  when  computing  the  norm  of  a  row  or  column  of  a  matrix.  Although  a 
diagonal  similarity  transformation  using  powers  of  the  base  of  the  arithmetic  system  causes 
no  roundoff  errors  at  all  it  may  be  ill-conditioned.  In  fact  when  balancing  is  most  useful 
then  the  similarity  will  be  ill-conditioned.  The  purpose  of  such  transforms  is  to  reduce  the 
norm  of  the  matrix  and  thereby  to  make  the  reduction  to  Schur  form  more  accurate  because 
norms  are  used,  both  explicitly  and  implicitly,  to  judge  whether  certain  transforms  are 
necessary.  For  the  purpose  of  computing  eigenvalues  these  transforms  are  beneficial  but  we 
are  concerned  about  the  effect  of  the  back  transformations,  at  the  very  end,  when  some 
function  of  the  matrix  has  been  computed.  We  have  not  made  extensive  tests  with  Balanc¬ 
ing  and  consequently  have  not  included  it  in  our  current  implementation.  Here  is  an  exam¬ 
ple  of  an  unfortunate  transformation  by  BALANC.  Given  matrix  B,  let  C  be  the  balanced 
B. 

1  0  1£-10  0 

_  0  2  0  IE-10 

fl=  3  1  3  0 

4  5  1  4 

After  BALANC  (only  three  digits  are  shown) 

1  0  8.19E-7  0 

0  2  0  4.19E-4 

C  *  3.66E-4  2.44E-4  3  0 

4.77E-7  1.19E-6  9.77E-4  4 


where 
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B  =  D  C  IT1,  D  = 


Apply  Scfaur  decomposition  to  B  and  C  to  obtain  their  Schur  factors  Pb,  Sj  and  Pc,  5t. 
On  a  Vax  750  using  single  precision  arithmetic,  we  have 
||  B-PbShP; ||  =7.78£-6, 

||  B-DPf,  Su  Pj'Z)_1||  =2.67. 


Please  note  that  if  the  diagonal  elements  were  used  in  computing  row  and  column  norms  in 
Balancing  then  B  would  be  changed  very  little. 


[b]  Criterion  for  Clustering. 

When  the  Schur  form  is  real  with  real  eigenvalues  on  the  diagonal  (and  r> 0),  [3|  showed  a 
way  to  determine  the  clustering  that  guarantees  tight  bound  on  the  accumulated  error  in 
running  the  recurrence.  We  quote  the  result  here. 

Criterion  R. 

(i) .  Order  the  diagonals  (eigenvalues)  X,  from  left  to  right,  i.e.,  X,<X;  if  i<j. 

(ii) .  If  r|  X,-X;  |  < g(  |  i-j  | )  then  X,  and  X;  must  go  into  the  same  cluster.  Here 

1+-^),  *>o. 

It  is  not  clear  how  to  extend  criterion  R  to  complex  numbers.  A  tentative  solution,  proposed 
in  [3|,  is  called  matrix  argument  reduction  We  describe  it  briefly.  It  replaces  rS  by  a 
different  matrix  G  that  has  the  same  exponential  as  rS  but  all  of  whose  eigenvalues  lie 
within  it  of  the  real  axis.  Then  Criterion  R  may  be  applied  to  G.  This  often  works  well 
but  there  are  cases  where  G  is  not  computed  accurately  enough  in  finite  precision  arith¬ 
metic.  Further  analysis  shows  that  matrix  argument  reduction  is  equivalent  to  clustering 
the  eigenvalues  of  5  by  their  imaginary  parts  without  regard  to  their  real  parts. 


We  have  abandoned  explicit  argument  reduction  but  have  used  it  implicitly  in  designing  our 
Criterion  C  for  clustering.  For  each  dimension  k  a  bidiagonal  matrix  Z,  ( 
2, f,  =  r-(i-k/2),  2t,,+i  =  »)  is  used  to  determine  the  smallest  rsuch  that  the  recurrence  for¬ 
mula  still  gives  accurate  answers,  and  define  gt  =  r*k.  Then  we  interpolate  these  gk  by  a 
second  degree  polynomial  on  positive  integers  k.  Thus,  we  arrive  at  the  following  two  level 
clustering  criterion. 

Criterion  C. 

(i) .  Order  the  eigenvalues  according  to  their  real  parts  (from  negative  to  positive). 

(ii) .  Apply  Criterion  R  to  determine  clusters  using 

j(/fc)=0.2  +  2{k  -  1)  +  0.03*2. 

(iii) .  For  each  cluster,  reorder  the  eigenvalues  by  their  imaginary  parts.  When  these 

differ  by  less  than  tt  the  eigenvalues  are  put  into  the  same  cluster. 

We  have  concluded  that  the  block  structure  in  Step  3  must  be  a  function  of  t.  The  reason 
is  independent  of  our  use  of  the  recurrence.  The  relative  sensitivity  of  the  (j,k)  block  of  E 
depends  on  the  separation  of  the  eigenvalues  of  E „  and  Eu  and  this  will  vary  with  r.  The 
plausibility  of  this  decision  may  be  seen  from  a  consideration  of  extreme  cases.  When 
r  «  ||  R||  ,  say  10"*||  fl||  ,  then  it  will  be  efficient  to  consider  the  matrix  as  one  block 
and  use  a  few  terms  of  the  interpolating  polynomial.  On  the  other  hand  when  r  is  large,  but 
still  small  enough  to  make  E  computable,  then  the  recurrence  can  be  used  for  most  of  the 
off  diagonal  elements.  Our  code  allows  negative  (even  complex!)  values  for  t  and  a  change 
of  sign  certainly  warrants  a  new  partitioning. 

We  had  at  least  hoped  to  keep  the  ordering  of  the  eigenvalues  of  the  Schur  form  indepen¬ 
dent  of  r  but  Step  3  of  section  6  shows  that  we  failed  for  the  general  complex  case.  However 
please  note  that  this  fact  is  hidden  in  Step  3.  At  the  level  of  Steps  1,2,4,  5  the  Schur  form 
is  fixed.  By  accepting  the  burden  of  a  possible  reordering  at  Step  3  we  can  obtain  high  rela¬ 
tive  accuracy  whenever  the  data  warrants  it. 
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Let  us  consider  an  objection  to  Step  3.  It  seems  likely  that  as  r  varies  the  reordering  in  3(a) 
of  section  6  will  turn  out  to  remain  the  same.  If  this  is  the  case  then  the  unitary  swaps 
could  have  been  done  initially  at  Step  2.  The  flaw  in  this  argument  is  that  the  partitioning 
may  be  changing  with  r  and  an  ideal  ordering  may  be  very  difficult  to  determine  in  advance. 

[c]  The  Interpolating  Polynomial. 

We  regard  this  procedure  as  an  efficient  form  of  the  exponential  series.  It  requires  at  most 
m-1  terms  for  aa  m  by  m  matrix  so  a  reasonable  upper  bound  on  the  cost  is  known  a 
priori:  (m*/ 24  +  lower  order  terms)  scalar  multiplications.  The  price  paid  for  this 
insurance  is  that  the  universal  coefficients  l/k\  are  replaced  by  problem  dependent  divided 
differences  7*.  We  have  made  a  careful  study  of  this  problem  in  [3]  and  the  conclusion  is 
that,  for  exp,  the  whole  set  of  coefficients  can  be  evaluated  with  high  relative  accuracy  in 

0(m2)  scalar  operations.  The  constant  in  0  is  between  -i-  and  601og2m.  No  extra  storage 

is  required. 

[d|  •  Computation  of  divided  differences. 

It  turns  out  that  the  form  the  top  row  of  exp(Z),  where  Z  is  obtained  from  the  given  tri¬ 
angular  matrix  T  by  keeping  the  diagonal  unchanged,  putting  l's  just  above  it,  and  putting 
to  zero  all  the  elements  above  the  l’s.  Thus  Z  is  a  bidiagonal  matrix.  In  practice  we  prefer 
to  compute  scaled  divided  differences  7t(/fc-l)!  and  to  scale  down  the  partial  products.  See 
[3]  for  details. 
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8.  Usage 

The  user  is  supposed  to  have  an  n  Xn  complex  matrix  B,  a  complex  r,  and  an  nXk  com¬ 
plex  matrix  V.  The  task  is  to  compute  the  n  X  k  matrix  U  given  by 

U  =  exp(rB)V. 

By  choosing  V=I,  the  identity  matrix,  and  k=n,  the  output  will  be  exp(rB).  The  recomputa¬ 
tion  of  U  when  r  and  V  are  changed  is  easy  and  quick. 

In  order  to  invoke  the  subprograms  the  user  must  provide  four  2-dimensional  complex 
arrays: 

B,P ,E  (each  n  X  n),  and  K(nX  k). 

In  addition  the  program  needs  one  real  array  w  of  dimension  lOn  and  two  integer  arrays  ip  and 
idx  of  dimension  n .  At  the  end  B  will  have  been  changed  to  S  and  V  will  have  been  overwritten 
with  V . 


In  order  to  obtain  U  four  subroutines  must  be  called  in  the  proper  order. 


CALL  CSCHUR  (B,P,low,igh,w,n,nb,np,ierr) 

CALL  CORDER  (B ,P ,r,w ,n  ,nb ,np) 

CALL  CEXPHY  (B,E,w ,ip  ,idx,r,ovft  ,ierr  ,n,nb  ,ne) 

CALL  CFMULV  (E,P,  V,w,n  ,nb  ,ne  ,np  ,nv,k) 

Here  low .igh ,np  ,nb  ,ne  ,nv ,n ,k ,ierr  are  integer  variables,  ovft  is  the  overflow  threshold. 

The  recomputation  of  exp(rB)F  with  new  values  for  r  and  V  is  particularly  efficient 
(approximately  5 ns  for  each  r).  The  calling  sequence  is  as  follows. 


CALL  CSCHUR  {B ,P  ,low ,igh  ,w ,n  ,nb ,np  ,ierr) 

do  10  i=l,  enough 
Get  r 

CALL  CORDER  (B ,P ,r,w ,n ,nb ,np) 

CALL  CEXPHY  (B  ,E  ,w  ,ip  ,idx  ,T,ovft  ,ierr  ,n  ,nb  ,ne) 
Get  V 

CALL  CFMULV  {E,P,  V ,w ,n  ,nb ,ne ,np ,nv ,k) 

10  CONTINUE 
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13.  ABSTRAC 


This  document  describes  a  program  to  compute  the  exponential  of  a  given  n  by  n 
matrix  B  multiplied  by  a  scalar  x  that  is  to  be  thought  of  as  representing  time.  Our 
primary  goal  has  been  to  achieve  as  much  accuracy  as  working  precision  permits  with¬ 
out  resorting  to  simulated  higher  precision.  The  final  product  is  more  complicated 
than  we  anticipated  at  the  outset.  How  these  complications  came  to  be  accepted  is 
the  theme  of  this  story.  The  cases  we  consider  may  be  of  interest  to  those  who  wish 
to  use  the  matrix  exponential  in  their  work.  We  hasten  to  add  that  the  code  acts 
simply  on  simple  cases. 

Our  code  is  unlikely  to  be  the  last  word  on  the  computation  of  exp(Bt)  since 
speed  and  simplicity  deserve  some  recognition.  For  simplicity  and  generality  we 
consider  only  complex  matrices  and  only  those  whose  order  n  is  small  enough  that 
four  n  by  n  arrays  fit  comfortably  in  the  fast  store.  In  practice  we  expect  to 
have  n  <  100.  Recall  that  unless  B  has  special  structure,  such  as  being  block 
diagonal,  its  exponential  will  have  no  zero  elements. 

The  rest  of  the  paper  is  organized  in  the  way  indicated  in  the  table  of  contents. 
We  use  standard  conventions:  capital  letters  for  matrices,  lower  case  for  column 
vectors,  and  lower  case  Greek  for  scalars.  In  particular  we  switch  to  x  for  time 
instead  of  t. 
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Security  Classification 


The  designer  of  a  program  to  invert  a  matrix  has  to  decide  what  action  should 
be  taken  when  a  given  matrix  is  close  to  singular.  Should  an  indication  of  sensi¬ 
tivity  be  given  along  with  some  output?  Although  the  exponential  of  a  matrix  is 
always  defined  it  may  be  exceedingly  sensitive  to  tiny  perturbations  of  the  original 
matrix.  Ideally  a  program  such  as  ours  should  compute  some  measure  of  the  sensi¬ 
tivity  of  the  data.  We  have  such  a  facility  but,  for  the  sake  of  brevity,  we  shall 
present  it  in  another  communication. 
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Legal  Notice 

This  report  was  prepared  as  an  account  of  work  sponsored  by 
the  Center  for  Pure  and  Applied  Mathematics.  Neither  the 
Center  nor  the  Department  of  Mathematics,  makes  any  warranty 
expressed  or  implied,  or  assumes  any  legal  liability  or 
responsibility  for  the  accuracy,  completeness  or  usefulness 
of  any  information  or  process  disclosed. 
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